Spelling correction using probabilistic methods
Abstract
A probabilistic procedure is proposed for the automatic correction of spelling and typing errors in printed English texts. The heart of the procedure is a probabilistic model of how a garbled word is generated from the correct word. The garbler can delete or insert symbols in the word, or substitute one or more symbols with other symbols. An expression is derived for P(Y | X), the probability of generating a garbled word Y from a correct word X. The model is probabilistically consistent. Using the expression for P(Y | X), we can derive an estimate of the correct word from the garbled word Y that minimizes the average probability of error in the decision. An important feature of the expression for P(Y | X) is that it can be computed recursively. Experiments conducted using a dictionary of the 1025 most common English words indicate that the accuracy of correction by this scheme is substantially greater than that obtained by other algorithms, especially for garbled words derived from relatively short words of length less than 6.
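The abstract does not reproduce the paper's expression for P(Y | X), so the following is only a minimal sketch of the general idea under a simplified garbler that is assumed here for illustration. Before each symbol of X, and once more after the last one, the assumed garbler inserts uniformly random symbols with probability `p_ins` per insertion; each symbol of X is then deleted with probability `p_del`, replaced by a different uniformly chosen symbol with probability `p_sub`, or copied otherwise. The names `likelihood` and `correct`, the parameter values, and the alphabet are all hypothetical. Because this is a proper generative process, the values of P(Y | X) sum to 1 over all Y, and the probability can be filled in recursively over prefixes of X and Y. The decision step picks the dictionary word maximizing P(Y | X) P(X), which is the standard rule for minimizing the probability of error.

```python
import string

ALPHABET = string.ascii_lowercase  # assumed symbol set (needs at least 2 symbols)


def likelihood(y, x, p_ins=0.05, p_del=0.05, p_sub=0.05, alphabet=ALPHABET):
    """P(y | x) under the simplified garbler assumed in the lead-in."""
    k = len(alphabet)
    p_copy = 1.0 - p_del - p_sub
    n, m = len(x), len(y)
    # f[i][j]: probability that x[:i] has been processed and y[:j] emitted.
    f = [[0.0] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            total = 0.0
            if j > 0:
                # y[j-1] was inserted before processing x[i].
                total += f[i][j - 1] * p_ins / k
            if i > 0:
                # Insertions stopped, then x[i-1] was deleted.
                total += f[i - 1][j] * (1.0 - p_ins) * p_del
            if i > 0 and j > 0:
                # Insertions stopped, then x[i-1] was copied or substituted.
                same = x[i - 1] == y[j - 1]
                emit = p_copy if same else p_sub / (k - 1)
                total += f[i - 1][j - 1] * (1.0 - p_ins) * emit
            f[i][j] = total
    # The final insertion phase after the last symbol of x must also stop.
    return f[n][m] * (1.0 - p_ins)


def correct(y, priors):
    """Return the dictionary word x maximizing P(y | x) * P(x)."""
    return max(priors, key=lambda x: likelihood(y, x) * priors[x])


# Hypothetical four-word dictionary with a uniform prior.
words = ["which", "with", "wish", "rich"]
priors = {w: 1.0 / len(words) for w in words}
print(correct("wich", priors))
```

With these assumed parameters a single deletion is far more likely than a substitution by one specific symbol, so "wich" is mapped to "which" rather than to "with" or "wish". In practice the edit probabilities and the word priors would be estimated from data rather than fixed by hand.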
Similar resources
Design and implementation of Persian spelling detection and correction system based on Semantic
Persian Language has a special feature (grapheme, homophone, and multi-shape clinging characters) in electronic devices. Furthermore, design and implementation of NLP tools for Persian are more challenging than other languages (e.g. English or German). Spelling tools are used widely for editing user texts like emails and text in editors. Also developing Persian tools will provide Persian progr...
A Probabilistic Model for Spelling Correction
Spelling correctors are widely encountered in computer software applications and they provide the most plausible replacements for words that are presumably incorrect. The paper proposes a spelling correction method starting from the Damerau-Levenshtein edit distance and using Bayesian decision theory. The resulting algorithm is tested on a bag of words from the New York Times news articles. 2000 ...
Detection of Spelling Errors in Swedish Not Using a Word List En Clair
We investigate how to construct an efficient method for spelling error detection and correction under the prerequisite of using a word list that is encoded and not possible to decode. Our method is probabilistic and the word list is stored as a Bloom filter. In particular we study how to handle compound words and inflections in Swedish.
Arib@QALB-2015 Shared Task: A Hybrid Cascade Model for Arabic Spelling Error Detection and Correction
In this paper we present the Arib system for Arabic spelling error detection and correction as part of the second Shared Task on Automatic Arabic Error Correction. Our system contains many components that address various types of spelling error and applies a combination of approaches including rule based, statistical based, and lexicon based in a cascade fashion. We also employed two core model...
Assessing what students know: Effects of assessment type on spelling performance and relation to working memory
A central objective of educational assessment is to maximise the accuracy (validity) and consistency (reliability) of the methods used to assess students’ competencies. Different tests, however, often employ different methods of assessing the same domain-specific skills (e.g., spelling). As a result, questions have arisen concerning the legitimacy of using these various modes interchangeably as...
Journal: Pattern Recognition Letters
Volume: 2
Publication date: 1984